Ridgelets and their Derivatives:
Representation of Images with Edges


Emmanuel J. Candes


Abstract. This paper reviews the development of several recent tools from computational harmonic analysis. These new systems are presented from a single perspective, namely, the representation of bivariate functions that are singular along smooth curves (edges). First, the representation of functions that are smooth away from straight edges is presented, and ridgelets are shown to provide near-optimal nonlinear approximations to these objects. Motivated by the limitations of the ridgelet methodology, new representation systems, namely monoscale ridgelets and curvelets (both of which use the ridgelet transform as a building block), are introduced. Curvelets are shown to provide concrete and constructive optimal nonlinear approximations to smooth functions with twice differentiable singularities. In addition, these approximations are obtained simply by thresholding the curvelet series.


§1. Introduction

Throughout the sciences, sparse representations of classes of objects are often sought because of the well-known applications of sparsity to problems ranging from data compression and statistical estimation to feature detection. Indeed, finding sparse representations, together with rapid algorithms to compute them, is one of the main objectives of a rapidly growing field, computational harmonic analysis (CHA). In this paper, we will argue that CHA has not really addressed the problem of efficiently representing smooth multivariate functions with sharp discontinuities, such as smooth images with edges. Motivated by this gap in the literature, we present a collection of new representation tools that efficiently represent smooth functions that are singular along curves. The tone here is expository; details may be found in the cited references. Attention is restricted to the two-dimensional situation, although extensions to higher dimensions exist or are anticipated.
The wavelet miracle

One of the most appealing features of wavelet systems is their ability to provide efficient representations of spatially inhomogeneous functions, i.e., functions that may be discontinuous, spiky, etc. In Mallat's words, “bases of smooth wavelets are the best bases for representing objects composed of singularities, when there may be an arbitrary number of singularities, which may be located in all possible spatial positions” [8]. For instance, on the unit interval define

$$f(t) = H(t - t_0)\, g(t), \qquad t \in [0,1], \tag{1}$$

where H is the Heaviside function $H(t) = 1_{\{t>0\}}$ and g is an arbitrary smooth function with compact support and finite Sobolev norm $\|g\|_{W_2^s}$ (see [1] for the classical definition of $L^2$ Sobolev norms). Then the number of Fourier coefficients of f exceeding 1/n in absolute value is bounded below by $c \cdot n$, regardless of the degree of smoothness of f away from the singular point $t_0$. This means that many terms are needed to obtain good partial reconstructions: keeping the n largest terms in the Fourier series gives an $L^2$ approximation error of order only $n^{-1/2}$. (Throughout the paper, the error is implicitly measured in the $L^2$ norm.) In contrast, the sparsity of the wavelet coefficient sequence of f is in some sense the same as if f were not singular. In effect, the number of wavelet coefficients exceeding 1/n is bounded by $C n^{2/(2s+1)}$, giving rates of approximation of order $n^{-s}$, which corresponds to the nonlinear bandwidth of $W_2^s$ Sobolev balls. This remarkable adaptivity property is what we call the “wavelet miracle.”
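The contrast can be checked on a discretized version of (1). The sketch below is only an illustration under assumptions that are not in the paper: it takes g ≡ 1 and $t_0 = 0.3$, samples f at N = 256 points, uses the Haar wavelet as a crude stand-in for smooth wavelets, and counts the coefficients larger than a fixed threshold in a unitary DFT and in a full Haar transform.

```python
import cmath
import math

def haar_details(x):
    """All detail coefficients of the orthonormal Haar transform of a length-2^J list."""
    details, approx = [], list(x)
    while len(approx) > 1:
        half = range(len(approx) // 2)
        details += [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in half]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in half]
    return details

def unitary_dft(x):
    """Naive O(N^2) discrete Fourier transform, scaled to preserve the l2 norm."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]

N, t0 = 256, 0.3
# Samples of (1) with the simplifying choice g = 1, so f is a pure step at t0.
f = [1.0 if i / N >= t0 else 0.0 for i in range(N)]

threshold = 1e-2
n_fourier = sum(abs(c) > threshold for c in unitary_dft(f))
n_haar = sum(abs(d) > threshold for d in haar_details(f))
print("Fourier:", n_fourier, "Haar:", n_haar)
```

With g ≡ 1, the Haar transform spends at most one coefficient per scale on the jump (at most $\log_2 N = 8$ here), while most of the 256 Fourier coefficients stay above the threshold.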

The curse

Unfortunately, although wavelets deal well with point-like singularities, they are seriously challenged by line-like singularities in dimension two. Consider, for instance, the object

$$f(x_1,x_2) = H(x_1\cos\theta_0 + x_2\sin\theta_0 - t_0)\, g(x_1,x_2), \qquad (x_1,x_2) \in [0,1]^2, \tag{2}$$

where, again, g is a bivariate function taken from the Sobolev space $W_2^s$; f is singular on the line $x_1\cos\theta_0 + x_2\sin\theta_0 = t_0$ but smooth otherwise. Then the number of wavelet coefficients exceeding 1/n is now of order n. Hence, partial n-term wavelet reconstructions converge only at the rate $n^{-1/2}$, regardless of the degree s of smoothness away from the line. The edge limits the speed of convergence. This result is intuitively not surprising, as wavelet bases are made of local, isotropic oscillatory bumps at various scales and are not adapted to representing long, elongated structures such as edges.
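The same kind of experiment in two dimensions exhibits the curse. The sketch below is again only an illustration: the separable 2D Haar transform stands in for smooth wavelets, and g ≡ 1, $\cos\theta_0 = 0.6$, $\sin\theta_0 = 0.8$ and $t_0 = 0.5$ are choices made here for convenience. It counts the nonzero detail coefficients of a pixelized version of (2) as the resolution doubles.

```python
def haar2d_nonzero_details(img, tol=1e-9):
    """Number of nonzero detail coefficients in the full orthonormal 2D Haar transform."""
    count, a = 0, [row[:] for row in img]
    while len(a) > 1:
        m = len(a) // 2
        nxt = [[0.0] * m for _ in range(m)]
        for i in range(m):
            for j in range(m):
                p, q = a[2 * i][2 * j], a[2 * i][2 * j + 1]
                r, t = a[2 * i + 1][2 * j], a[2 * i + 1][2 * j + 1]
                nxt[i][j] = (p + q + r + t) / 2  # scaling coefficient of this 2x2 block
                # horizontal, vertical and diagonal details of the block
                details = ((p - q + r - t) / 2, (p + q - r - t) / 2, (p - q - r + t) / 2)
                count += sum(abs(d) > tol for d in details)
        a = nxt
    return count

def edge_image(n, c=0.6, s=0.8, t0=0.5):
    """Pixel-center samples of H(c*x1 + s*x2 - t0) on an n x n grid over [0,1]^2 (g = 1)."""
    return [[1.0 if c * (j + 0.5) / n + s * (i + 0.5) / n > t0 else 0.0
             for j in range(n)] for i in range(n)]

counts = {n: haar2d_nonzero_details(edge_image(n)) for n in (32, 64, 128)}
print(counts)
```

The count roughly doubles with each doubling of resolution: at every scale, each dyadic square crossed by the edge contributes its own coefficients, and at scale j there are on the order of $2^j$ such squares.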

This clearly raises an important question: in two dimensions (and, more generally, in arbitrary d dimensions), can we develop a representation enjoying the same adaptivity features as wavelets in dimension one?
§2. Ridgelets and Linear Singularities

In [3], Candes introduced a new tiling of the frequency plane that led to the construction of ridgelet frames. We say that a collection $(\varphi_n)$ is a frame of a Hilbert space $\mathcal{H}$ if there exist two constants $A, B > 0$ such that for any element f of $\mathcal{H}$ we have

$$A\,\|f\|_{\mathcal{H}}^2 \le \sum_n |\langle f, \varphi_n\rangle_{\mathcal{H}}|^2 \le B\,\|f\|_{\mathcal{H}}^2.$$

When A = B, the frame is said to be tight. A collection $(\varphi_n)$ that satisfies the frame property is, of course, complete, and there is a very concrete way to reconstruct f from its coefficients $(\langle f, \varphi_n\rangle_{\mathcal{H}})$. Generalities about frames can be found in [11].
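A finite-dimensional toy example, which is not taken from the paper, makes the definition concrete: three unit vectors at 120° in $\mathbb{R}^2$ form a tight frame with A = B = 3/2. The system is redundant (three vectors in a two-dimensional space), yet f is recovered exactly as $f = A^{-1}\sum_n \langle f, \varphi_n\rangle \varphi_n$.

```python
import math

# Hypothetical example: three unit vectors at 120 degrees in R^2.
FRAME = [(math.cos(2 * math.pi * m / 3), math.sin(2 * math.pi * m / 3)) for m in range(3)]

def analysis(f):
    """Frame coefficients <f, phi_m> for a vector f in R^2."""
    return [f[0] * p[0] + f[1] * p[1] for p in FRAME]

def tight_synthesis(coeffs, bound):
    """Reconstruction for a tight frame with A = B = bound: f = (1/A) sum_m c_m phi_m."""
    return tuple(sum(c * p[i] for c, p in zip(coeffs, FRAME)) / bound for i in range(2))

f = (0.3, -1.7)
coeffs = analysis(f)
ratio = sum(c * c for c in coeffs) / (f[0] ** 2 + f[1] ** 2)
print(ratio)                          # the frame bound A = B (3/2 up to rounding)
print(tight_synthesis(coeffs, 1.5))   # recovers f up to rounding
```

The three vectors are not orthogonal; tightness only says that the analysis map is a constant multiple of an isometry.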

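Let ψ be a univariate oscillatory function and $\psi_{j,k}(t) = 2^{j/2}\psi(2^j t - k)$. The ridgelet frame $(\psi_{j,\ell,k})$ is a collection of ridge functions given in the frequency domain [3] by

$$\hat\psi_{j,\ell,k}(\xi) = |\xi|^{-1/2}\big(\hat\psi_{j,k}(|\xi|)\,\delta(\theta - 2\pi\ell 2^{-j}) + \hat\psi_{j,k}(-|\xi|)\,\delta(\theta + \pi - 2\pi\ell 2^{-j})\big)/2,$$

where $\xi = |\xi|(\cos\theta, \sin\theta)$ in polar coordinates and δ denotes the Dirac distribution.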

Donoho [9] modified the ridgelet construction by essentially replacing the discretization of the angular variable with a periodic wavelet transform, resulting in an orthonormal basis. He called these new basis elements orthonormal ridgelets. In the remainder of this paper, we choose orthonormal ridgelets, although all the results and constructions that follow would hold true if one were to use ‘pure ridgelets.’

As stated in [9], such a system can be defined as follows: let $(\psi_{j,k}(t))_{j,k\in\mathbb{Z}}$ be an orthonormal basis of Meyer wavelets for $L^2(\mathbb{R})$ [12], and let $(w^0_{i_0,\ell}(\theta),\ \ell = 0,\dots,2^{i_0}-1;\ w^1_{i,\ell}(\theta),\ i \ge i_0,\ \ell = 0,\dots,2^{i}-1)$ be an orthonormal basis for $L^2[0,2\pi)$ made of periodized Lemarie scaling functions $w^0_{i_0,\ell}$ at level $i_0$ and periodized Meyer wavelets $w^1_{i,\ell}$ at levels $i \ge i_0$. (We suppose a particular normalization of these functions.) Let $\hat\psi_{j,k}(\omega)$ denote the Fourier transform of $\psi_{j,k}(t)$, and define ridgelets $\rho_\lambda(x)$, $\lambda = (j,k;i,\ell,\varepsilon)$, as functions of $x \in \mathbb{R}^2$ using the frequency-domain definition

$$\hat\rho_\lambda(\xi) = |\xi|^{-1/2}\big(\hat\psi_{j,k}(|\xi|)\,w^\varepsilon_{i,\ell}(\theta) + \hat\psi_{j,k}(-|\xi|)\,w^\varepsilon_{i,\ell}(\theta+\pi)\big)/2. \tag{3}$$

Here the indices run as follows: $j, k \in \mathbb{Z}$, $\ell = 0,\dots,2^{i-1}-1$; $i \ge i_0$, $i \ge j$. Notice the restrictions on the range of ℓ and on i. Let Λ denote the set of all such indices λ. It turns out that $(\rho_\lambda)_{\lambda\in\Lambda}$ is a complete orthonormal system for $L^2(\mathbb{R}^2)$. Hence, we have a new decomposition of the form

$$f = \sum_\lambda \langle f, \rho_\lambda\rangle\, \rho_\lambda.$$

Ridgelets turn out to be optimal for representing functions with linear singularities. Indeed, let us consider the template (2). The following theorem is proved in [4].
Theorem 1. Let $g \in W_2^s(\mathbb{R}^2)$ and $f(x_1,x_2) = H(x_1\cos\theta_0 + x_2\sin\theta_0 - t_0)\, g(x_1,x_2)$. Then the sequence $(\alpha_\lambda = \langle f, \rho_\lambda\rangle)$ of orthonormal ridgelet coefficients of f satisfies

$$\#\{\lambda : |\alpha_\lambda| > 1/n\} \le C\,\big(n\,\|g\|_{W_2^s}\big)^{2/(s+1)}$$

for some constant C not depending on f. As a consequence, the n-term approximation $f_n$, obtained by keeping the terms corresponding to the n largest coefficients in the ridgelet expansion, satisfies

$$\|f - f_n\|_2 \le C\, n^{-s/2}\, \|g\|_{W_2^s}.$$

Hence, the theorem states that we obtain a rate of approximation as if the object were not singular, simply by thresholding the orthonormal ridgelet expansion. Whereas the singularity caused partial wavelet reconstructions to converge very slowly, its effect on the approximation rate of truncated ridgelet series is ‘harmless.’
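The step from a count of large coefficients to an approximation rate is generic. If at most $C\varepsilon^{-p}$ coefficients exceed ε in magnitude, then the k-th largest coefficient is at most $C' k^{-1/p}$, and Parseval's identity in an orthonormal basis gives $\|f - f_n\|_2 \approx n^{1/2 - 1/p}$. The sketch below checks this numerically on hypothetical coefficient sequences that decay exactly like $k^{-1/p}$: p = 1 (counts of order n) gives $n^{-1/2}$, while p = 2/(s+1) gives $n^{-s/2}$, the rate for $W_2^s$ objects in two dimensions.

```python
import math

def tail_error(p, n, total=100_000):
    """||f - f_n||_2 when the sorted coefficient magnitudes are exactly k^(-1/p).

    Keeping the n largest terms of an orthonormal expansion leaves, by Parseval, the
    square root of the sum of the discarded squared coefficients (truncated at `total`).
    """
    return math.sqrt(sum(k ** (-2.0 / p) for k in range(n + 1, total + 1)))

def empirical_rate(p, n1=100, n2=1600):
    """Slope of log ||f - f_n|| versus log n between n1 and n2."""
    return math.log(tail_error(p, n2) / tail_error(p, n1)) / math.log(n2 / n1)

s = 2.0
for p in (1.0, 2.0 / (s + 1)):
    print(f"p = {p:.3f}: empirical exponent {empirical_rate(p):+.3f}, "
          f"predicted {0.5 - 1.0 / p:+.3f}")
```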


§3. Ridgelets and Curved Edges

Theorem 1 considered linear singularities, and it seems natural to ask whether similar results hold if one replaces the singularity along a straight line with one along an arbitrary curve γ. To simplify the exposition, consider the simple case of a singular function defined on the unit square by

$$f(x_1,x_2) = g(x_1,x_2)\,1_{\{x_2 < \gamma(x_1)\}}, \tag{4}$$

where g is a smooth function and γ is a smooth curve. Then the ridgelet coefficient sequence of such an object is, in general, not sparse:

$$\#\{\lambda : |\alpha_\lambda| > 1/n\} \ge c\, n.$$

Thus, the speed of convergence of the best n-term ridgelet approximation is only of order $n^{-1/2}$. It is interesting to observe that wavelet and ridgelet partial reconstructions achieve the same degree of approximation, although they correspond to radically different systems of representation: ridgelets are elongated and directional, whereas wavelets are isotropic and local.

The limitations presented in this section motivate the refinements and new tools that we are about to introduce.

§4. Monoscale Ridgelets

The approach developed in this section builds on Theorem 1. The idea is to take advantage of the optimal representation of linear singularities by localizing the ridgelets. A detailed exposition is provided in [5].

For an integer $s \ge 0$ and integers $k_1, k_2$, let Q be the dyadic square defined by $Q = [k_1/2^s, (k_1+1)/2^s) \times [k_2/2^s, (k_2+1)/2^s)$. The collection of all dyadic squares at scale s will be denoted by $\mathcal{Q}_s$. The idea is to smoothly localize the function f we wish to represent near each of the dyadic squares of $\mathcal{Q}_s$. We choose an orthonormal partition of unity $(w_Q)$, that is, a collection of windows whose squares form a partition of unity:

$$\sum_{Q\in\mathcal{Q}_s} w_Q^2 = 1.$$

The following details one way of making up such an orthonormal partition: take a $C^\infty$ univariate window ν supported in [−3/4, 3/4] such that ν(t) = 1 on [−1/2, 1/2]; define $\nu_Q = \nu(2^s x_1 - k_1)\,\nu(2^s x_2 - k_2)$; and renormalize the windows $\nu_Q$ with

$$w_Q = \nu_Q \Big/ \Big(\sum_{Q'\in\mathcal{Q}_s} \nu_{Q'}^2\Big)^{1/2}.$$

It is then clear that the $w_Q$'s obey the desired condition.
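The window construction can be sketched in a few lines. Two choices below are assumptions made for the illustration: the particular $C^\infty$ profile ν (built from $e^{-1/t}$), and a normalization sum taken over all integer translates, i.e., windows covering the whole plane rather than only the unit square. Because $\nu_Q$ is a tensor product, the normalized window then factors as $w_Q(x) = w_{k_1}(x_1)\,w_{k_2}(x_2)$.

```python
import math

def smooth_step(t):
    """C-infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    def h(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return h(t) / (h(t) + h(1.0 - t))

def nu(t):
    """C-infinity window: equal to 1 on [-1/2, 1/2], vanishing outside (-3/4, 3/4)."""
    return smooth_step((0.75 - abs(t)) / 0.25)

def w_1d(x, k, s):
    """w_k(x) = nu(2^s x - k) / (sum_j nu(2^s x - j)^2)^(1/2), summing over all integers j."""
    y = 2 ** s * x
    # Only j with |y - j| < 3/4 contribute; this range contains all of them.
    js = range(math.floor(y) - 1, math.floor(y) + 3)
    return nu(y - k) / math.sqrt(sum(nu(y - j) ** 2 for j in js))

def w_2d(x1, x2, k1, k2, s):
    """Tensor-product window: the 2D normalization factors into two 1D ones."""
    return w_1d(x1, k1, s) * w_1d(x2, k2, s)

s = 2
for x1, x2 in [(0.1, 0.7), (0.33, 0.5), (0.875, 0.125)]:
    total = sum(w_2d(x1, x2, k1, k2, s) ** 2
                for k1 in range(-2, 2 ** s + 3) for k2 in range(-2, 2 ** s + 3))
    print(x1, x2, total)   # sum of squared windows at this point
```

The denominator never vanishes: some integer j lies within 1/2 of $2^s x$, and ν equals 1 there.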

Define the rescaling operator $T_Q$ by

$$(T_Q g)(x_1,x_2) = 2^s\, g(2^s x_1 - k_1,\ 2^s x_2 - k_2),$$

which is an isometry of $L^2$. Throughout this section, s is arbitrary but fixed.
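The isometry claim reduces to the change of variables $y_i = 2^s x_i - k_i$, for which $dy_1\,dy_2 = 2^{2s}\,dx_1\,dx_2$ (a short derivation added for completeness):

$$\|T_Q g\|_2^2 = \int_{\mathbb{R}^2} 2^{2s}\,|g(2^s x_1 - k_1,\ 2^s x_2 - k_2)|^2\,dx_1\,dx_2 = \int_{\mathbb{R}^2} |g(y_1,y_2)|^2\,dy_1\,dy_2 = \|g\|_2^2.$$

Since $T_Q$ is also invertible, with $(T_Q^{-1}h)(y) = 2^{-s}\,h\big(2^{-s}(y_1 + k_1),\ 2^{-s}(y_2 + k_2)\big)$, it is unitary on $L^2(\mathbb{R}^2)$ and maps orthonormal bases to orthonormal bases.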
Monoscale ridgelets are defined as follows: let $(\rho_\lambda)$ be an orthonormal ridgelet basis and define

$$\psi_{Q,\lambda}(x_1,x_2) = w_Q(x_1,x_2)\,(T_Q\rho_\lambda)(x_1,x_2);$$

the collection

$$\{\psi_{Q,\lambda} : Q\in\mathcal{Q}_s,\ \lambda\in\Lambda\} \tag{5}$$

is what we call the monoscale ridgelet dictionary.

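It is easy to check that the monoscale ridgelet dictionary is a tight frame of $L^2(\mathbb{R}^2)$, as we have the Parseval relationship

$$\|f\|_2^2 = \sum_{Q\in\mathcal{Q}_s}\sum_{\lambda} |\langle f, \psi_{Q,\lambda}\rangle|^2.$$

Standard arguments show that we then have the decomposition

$$f = \sum_{Q\in\mathcal{Q}_s}\sum_{\lambda} \langle f, \psi_{Q,\lambda}\rangle\, \psi_{Q,\lambda},$$

with equality holding in an $L^2$ sense.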
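The Parseval relationship can be verified in a few lines, assuming (as in the construction above) that the windows $w_Q$ are real-valued and satisfy $\sum_Q w_Q^2 = 1$, and using that $T_Q$ is unitary, so that $(T_Q\rho_\lambda)_{\lambda\in\Lambda}$ is an orthonormal basis for each Q:

$$\sum_{Q\in\mathcal{Q}_s}\sum_{\lambda\in\Lambda} |\langle f, \psi_{Q,\lambda}\rangle|^2 = \sum_{Q}\sum_{\lambda} |\langle w_Q f, T_Q\rho_\lambda\rangle|^2 = \sum_{Q} \|w_Q f\|_2^2 = \int_{\mathbb{R}^2} |f|^2 \sum_{Q} w_Q^2 = \|f\|_2^2.$$

The second equality is Parseval's identity for the orthonormal basis $(T_Q\rho_\lambda)_\lambda$; the last one is the partition-of-unity condition.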

We add an “extra layer of coarse-scale coefficients” to eliminate various artifacts. Consider a standard multiresolution analysis adapted to the unit square [7], so that the set of translates $\varphi_{s,k} = 2^s\varphi(2^s\cdot - k)$, $k = (k_1,k_2)$, $k_i = 0,1,\dots,2^s-1$, is orthonormal. Let $P_0$ be the orthogonal projector onto $V_s$, the span of the $\varphi_{s,k}$'s, i.e.,

$$P_0 f := \sum_k \langle f, \varphi_{s,k}\rangle\, \varphi_{s,k} =: \sum_k \beta_{s,k}\, \varphi_{s,k}. \tag{6}$$
The following Pythagorean relationship holds:

$$\|f\|_2^2 = \|P_0 f\|_2^2 + \|(I - P_0) f\|_2^2. \tag{7}$$

Finally, writing $Rf = (I - P_0)f$ for the residual, define the coefficients

$$\alpha_{s,\mu} = \langle Rf, \psi_{Q,\lambda}\rangle, \qquad \mu = (Q,\lambda),\ Q\in\mathcal{Q}_s,\ \lambda\in\Lambda. \tag{8}$$

Definition 1. The monoscale ridgelet transform with base scale s is the mapping from functions $f \in L^2(\mathbb{R}^2)$ to the amalgamation of coefficients $(\beta_{s,k})$ and $(\alpha_{s,\mu})$.

Note that we again have a partial isometry

$$\|f\|_2^2 = \sum_k |\beta_{s,k}|^2 + \sum_\mu |\alpha_{s,\mu}|^2,$$

thanks to the Pythagorean relationship (7).
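The Pythagorean relation (7) and the resulting split of the energy into $\sum_k|\beta_{s,k}|^2$ and $\|Rf\|_2^2$ can be checked numerically. The sketch replaces the smooth multiresolution analysis of [7] by the Haar one, an assumption made for simplicity: φ is the indicator of the unit square, so $P_0$ simply averages f over each dyadic square of side $2^{-s}$. The test object, with $g(x) = 1 + x_1x_2$ and $\gamma(x_1) = 0.5 + 0.2\sin(3x_1)$, is hypothetical. The remaining identity $\sum_\mu|\alpha_{s,\mu}|^2 = \|Rf\|_2^2$ is the tight-frame property of the monoscale dictionary and is not computed here.

```python
import math

def coarse_projection(img, s):
    """Haar stand-in for (6): coefficients beta_{s,k} and residual Rf = f - P_0 f.

    img[i][j] samples f on an N x N pixel grid over [0,1]^2 (f taken constant on pixels);
    phi_{s,k} = 2^s times the indicator of the dyadic square of side 2^-s indexed by k.
    """
    n = len(img)
    b = n // 2 ** s                                  # pixels per square side
    beta, resid = {}, [row[:] for row in img]
    for k1 in range(2 ** s):
        for k2 in range(2 ** s):
            block = [(i, j) for i in range(k1 * b, (k1 + 1) * b)
                     for j in range(k2 * b, (k2 + 1) * b)]
            mean = sum(img[i][j] for i, j in block) / len(block)
            beta[(k1, k2)] = 2 ** s * mean * 4.0 ** (-s)  # 2^s times the integral of f over Q
            for i, j in block:
                resid[i][j] -= mean                   # P_0 f is the mean of f on each square
    return beta, resid

def sq_norm(img):
    """Squared L2 norm of a pixelwise-constant function on [0,1]^2."""
    return sum(v * v for row in img for v in row) / len(img) ** 2

N, s = 64, 3
grid = [(i + 0.5) / N for i in range(N)]
img = [[(1 + x1 * x2) * (1.0 if x2 < 0.5 + 0.2 * math.sin(3 * x1) else 0.0) for x1 in grid]
       for x2 in grid]
beta, resid = coarse_projection(img, s)
print(sq_norm(img), sum(v * v for v in beta.values()) + sq_norm(resid))
```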

Let us now return to the main theme of this paper and study the efficiency of monoscale ridgelets for representing objects that are singular along curves. Suppose that one is interested in constructing an n-term approximation of the function f in (4). Without loss of generality, we suppose that n is of the form $n = 2^{2J+1}$. We simply expand f in the monoscale ridgelet dictionary (5) with s = J as the choice of base scale; that is, we define the n-term approximation by

$$f_n = P_0 f + R_{n/2} f, \tag{9}$$

where $R_{n/2} f$ is the partial reconstruction of the residual Rf obtained by keeping the terms corresponding to the $n/2 = 2^{2J}$ largest coefficients (8).

It is interesting to observe that the choice of the base scale s of the monoscale dictionary depends on the number n of terms we wish to keep in the approximant. We have the following result [5]:
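The bookkeeping behind (9) is simple arithmetic: with base scale s = J there are $2^J \times 2^J = 2^{2J} = n/2$ coarse-scale coefficients $\beta_{J,k}$, so half of the budget goes to $P_0 f$ and the other half to the largest residual coefficients. The helper below is hypothetical and only makes this split explicit; it also shows that the base scale $J = (\log_2 n - 1)/2$ grows with n.

```python
def monoscale_n_term_budget(n):
    """Hypothetical helper splitting the n terms of (9), for n = 2^(2J+1).

    Returns (J, number of coarse coefficients beta_{J,k}, number of residual terms kept).
    """
    two_j = n.bit_length() - 2            # equals 2J when n = 2^(2J+1)
    if n != 2 ** (two_j + 1) or two_j % 2:
        raise ValueError("n must be of the form 2^(2J+1)")
    J = two_j // 2
    coarse = (2 ** J) ** 2                # translates phi_{J,k}, k in {0, ..., 2^J - 1}^2
    return J, coarse, n // 2              # keep the n/2 largest residual coefficients

for n in (2, 8, 32, 128):
    print(n, monoscale_n_term_budget(n))
```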

Theorem 2. Let $g \in W_2^s(\mathbb{R}^2)$ and $f(x) = g(x)\,1_{\{x_2 < \gamma(x_1)\}}$, with γ three times differentiable. Let $f_n$ be the n-term approximation defined by (9). Then

$$\|f - f_n\|_2 \le C \max(n^{-s/2},\ n^{-3/4}).$$

This simple approximation scheme provides optimal rates of convergence as long as $s \le 3/2$, that is, approximation bounds as if f were not singular. In some sense, one may say that, unlike wavelets, ridgelets can be adapted to provide efficient representations of curved singularities. There is, however, a critical value s = 3/2 of the smoothness parameter beyond which the method saturates: as s increases, the approximation rate is blocked at $n^{-3/4}$. Nevertheless, this already represents a substantial improvement over wavelet approximations, whose convergence rates are blocked at $n^{-1/2}$.
Better results are theoretically possible. For instance, let $\mathcal{F}(C)$ be a model of smooth images with twice differentiable edges, defined as follows:

$$\mathcal{F}(C) = \{ f : f \text{ satisfies (4) with } \|g\|_{W_2^2} \le C \text{ and } \|\gamma\|_{\dot C^2} \le C \}.$$

The condition $\|\gamma\|_{\dot C^2} \le C$ states that the homogeneous Hölder norm of order 2 is bounded by C. In other words, γ is differentiable and its first derivative satisfies the Lipschitz condition $|\gamma'(u) - \gamma'(v)| \le C|u - v|$. For this class of objects, it can be shown that there are reasonable ways of constructing approximations converging at the rate $n^{-1}\log n$. Monoscale ridgelets do not attain this optimal rate.

§5. Curvelets and Curved Singularities

The curvelet transform, introduced by Candes and Donoho in [6], is the last of the representation tools that we will review. Whereas the monoscale ridgelet transform involved taking ridgelet coefficients with a fixed base scale s, the curvelet transform spans all possible scales $s \ge 0$. A useful slogan is that the curvelet transform is obtained by filtering and then applying a multiscale ridgelet transform. The multiscale ridgelet dictionary is the collection of the monoscale dictionaries at all possible scales $s \ge 0$, i.e.,

$$\{\psi_\mu := \psi_{Q,\lambda} : s \ge 0,\ Q\in\mathcal{Q}_s,\ \lambda\in\Lambda\}. \tag{10}$$

The curvelet transform requires the use of a sequence of filters that we now describe. Let $\Phi_0$ and $\Psi_{2s}$, s = 0, 1, 2, ..., satisfy the following properties:

• $\Phi_0$ is a lowpass filter and is concentrated at frequencies $|\xi| \le 2$;

• $\Psi_{2s}$ is bandpass and concentrated at frequencies $|\xi| \in [2^{2s-1}, 2^{2s+3}]$;

• the filters satisfy

$$|\hat\Phi_0(\xi)|^2 + \sum_{s\ge 0} |\hat\Psi_{2s}(\xi)|^2 = 1.$$

Existence and constructions of such filters are well known. The last relationship implies that the transformation of f into a bank of functions

$$f \mapsto (P_0 f = \Phi_0 * f,\ \Delta_0 f = \Psi_0 * f,\ \Delta_1 f = \Psi_2 * f,\ \dots,\ \Delta_s f = \Psi_{2s} * f,\ \dots)$$

is a partial isometry in the sense that

$$\|f\|_2^2 = \|P_0 f\|_2^2 + \sum_{s\ge 0} \|\Delta_s f\|_2^2.$$
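One standard way to build such filters, shown here as a sketch that is not necessarily the construction used in [6], is telescoping: take a smooth nonincreasing profile χ equal to 1 on [0, 1] and to 0 on [2, ∞), and set $|\hat\Phi_0(\xi)|^2 = \chi(|\xi|)$ and $|\hat\Psi_{2s}(\xi)|^2 = \chi(|\xi|/4^{s+1}) - \chi(|\xi|/4^{s})$. The partial sums collapse to $\chi(|\xi|/4^{S+1})$, which equals 1 once $4^{S+1} \ge |\xi|$, and each $\Psi_{2s}$ lives in $2^{2s} \le |\xi| \le 2^{2s+3}$, inside the band stated above. The code works with squared radial profiles only (the smoothness of their square roots is not addressed); by Plancherel, the identity it checks is exactly what makes the filter bank a partial isometry.

```python
import math

def smooth_step(t):
    """C-infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    def h(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return h(t) / (h(t) + h(1.0 - t))

def chi(r):
    """Smooth, nonincreasing profile: 1 for r <= 1 and 0 for r >= 2."""
    return 1.0 - smooth_step(r - 1.0)

def phi0_sq(r):
    """Hypothetical |Phi_0(xi)|^2 as a function of r = |xi|; vanishes for r >= 2."""
    return chi(r)

def psi_sq(r, s):
    """Hypothetical |Psi_2s(xi)|^2 = chi(r / 4^(s+1)) - chi(r / 4^s).

    Nonnegative because chi is nonincreasing; zero unless 2^(2s) < r < 2^(2s+3).
    """
    return chi(r / 4 ** (s + 1)) - chi(r / 4 ** s)

def total(r, s_max=12):
    """Left-hand side of the partition identity, truncated at s = s_max."""
    return phi0_sq(r) + sum(psi_sq(r, s) for s in range(s_max + 1))

for r in [0.0, 1.5, 3.0, 17.0, 250.0, 1e5]:
    print(r, total(r))
```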


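Equipped with both a multiscale ridgelet dictionary and a sequence of filters, define the curvelet coefficient $\alpha_\mu$ of f, for each index $\mu = (Q,\lambda)$ in the set $\mathcal{M}$ of indices of the dictionary (10), by

$$\alpha_\mu = \langle \Delta_s f, \psi_{Q,\lambda}\rangle, \qquad Q\in\mathcal{Q}_s,\ \lambda\in\Lambda. \tag{11}$$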
Thus, the coefficient $\alpha_\mu$ is interpreted as the multiscale ridgelet coefficient of a piece of f containing information at frequencies near $2^{2s}$. We would like to point out that there is a quadratic scaling relationship between the scale $2^{-s}$ of the multiscale ridgelet and the frequency content, localized around the corona of radius $2^{2s}$, of the piece that is analyzed. This relationship is the key feature of the curvelet transform.
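The quadratic relationship can be tabulated. The reading used below, in which a curvelet at subband s lives on a square of side about $2^{-s}$ and oscillates across it at a wavelength of about $2^{-2s}$ (so that width ≈ length²), is an interpretation commonly attached to curvelets rather than a statement made in this section.

```python
def curvelet_scales(s):
    """Scales attached to subband s under the parabolic reading described above.

    Returns (frequency ~ 2^(2s), side 2^(-s) of the squares in Q_s, wavelength 2^(-2s)).
    """
    frequency = 4 ** s
    side = 2.0 ** (-s)
    wavelength = 1.0 / frequency
    return frequency, side, wavelength

for s in range(5):
    freq, side, width = curvelet_scales(s)
    print(f"s={s}: |xi| ~ {freq:5d}   side {side:.4f}   width {width:.6f}")
```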

We proceed a little differently for the piece of f containing information at low frequencies, $P_0 f$. Recall the orthogonal collection of Lemarie-Meyer scaling functions $\varphi_k(x_1,x_2) = \varphi(x_1 - k_1,\ x_2 - k_2)$, for $k = (k_1,k_2) \in \mathbb{Z}^2$. We choose the base scale so that $\hat\varphi(\xi) = 1$ for $|\xi| \le 4/3$, and we make sure that the span of the translates $\varphi_k$ contains the range of the projector $P_0$. We define the coarse-scale curvelet coefficients by

$$\beta_k = \langle P_0 f, \varphi_k\rangle, \qquad k \in \mathbb{Z}^2.$$

It will be more convenient to use a single notation to index the set of curvelet coefficients; the notation $\mathcal{M}'$ will stand for the union of $\mathcal{M}$ and $k \in \mathbb{Z}^2$. When $\mu \in \mathcal{M}' \setminus \mathcal{M}$, we let $\alpha_\mu = \beta_k$.

Definition 2. The curvelet transform is the mapping that associates the coefficient sequence $(\alpha_\mu)_{\mu\in\mathcal{M}'}$ to an arbitrary square-integrable function f.

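We will call curvelets those elements $\sigma_\mu$ defined by

$$\sigma_\mu = \Delta_s \psi_{Q,\lambda}, \qquad Q\in\mathcal{Q}_s,\ \lambda\in\Lambda, \tag{12}$$

with an obvious modification for the piece corresponding to the low frequencies, $\sigma_\mu = P_0 \varphi_k$.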

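The collection of curvelets is then a tight frame for $L^2(\mathbb{R}^2)$:

$$\|f\|_2^2 = \sum_{\mu\in\mathcal{M}'} |\langle f, \sigma_\mu\rangle|^2, \tag{13}$$

and, of course, we have the decomposition

$$f = \sum_{\mu\in\mathcal{M}'} \langle f, \sigma_\mu\rangle\, \sigma_\mu, \tag{14}$$

with equality in an $L^2$ sense.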

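Let $f_n$ be the truncated n-term curvelet series

$$f_n = \sum_{\mu\in\mathcal{M}'} \alpha_\mu\, 1_{\{|\alpha_\mu| \ge |\alpha|_{(n)}\}}\, \sigma_\mu, \tag{15}$$

where $|\alpha|_{(n)}$ denotes the n-th largest coefficient magnitude.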
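Rule (15) keeps every coefficient whose magnitude reaches the n-th largest one, so ties at the threshold can let slightly more than n terms through. The sketch below implements that selection together with the synthesis $\sum_\mu \alpha_\mu \sigma_\mu$. Since a real curvelet frame is far too large for a short example, a hypothetical three-vector Parseval frame in $\mathbb{R}^2$ (A = B = 1, as in (13)) stands in for the curvelets.

```python
import math

def truncate_series(coeffs, n):
    """Coefficients kept by (15): |alpha_mu| >= |alpha|_(n), the n-th largest magnitude.

    coeffs maps an index mu to a real coefficient alpha_mu. All ties at the threshold
    are kept, so the result can hold more than n terms.
    """
    if n <= 0 or not coeffs:
        return {}
    mags = sorted((abs(a) for a in coeffs.values()), reverse=True)
    threshold = mags[min(n, len(mags)) - 1]
    return {mu: a for mu, a in coeffs.items() if abs(a) >= threshold}

def synthesize(coeffs, atoms):
    """sum_mu alpha_mu * sigma_mu, where atoms[mu] is the (vector) frame element sigma_mu."""
    dim = len(next(iter(atoms.values())))
    return [sum(a * atoms[mu][i] for mu, a in coeffs.items()) for i in range(dim)]

# Hypothetical Parseval frame (A = B = 1) standing in for the curvelets:
# three vectors at 120 degrees in R^2, scaled by sqrt(2/3).
atoms = {m: (math.sqrt(2 / 3) * math.cos(2 * math.pi * m / 3),
             math.sqrt(2 / 3) * math.sin(2 * math.pi * m / 3)) for m in range(3)}
f = (0.3, -1.7)
alpha = {m: f[0] * v[0] + f[1] * v[1] for m, v in atoms.items()}

f_all = synthesize(truncate_series(alpha, 3), atoms)   # every term kept: exact, as in (14)
f_one = synthesize(truncate_series(alpha, 1), atoms)   # one-term approximation
print(f_all, f_one)
```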

The following theorem is proved in [6].

Theorem 3. Let $g \in W_2^2(\mathbb{R}^2)$ and $f(x) = g(x)\,1_{\{x_2 < \gamma(x_1)\}}$, with γ twice differentiable. Let $f_n$ be the n-term approximation (15). Then

$$\|f - f_n\|_2 \le C\, n^{-1} (\log n)^{1/2}.$$

Again, we have a very concrete procedure that achieves rates of approximation that cannot be fundamentally improved. A detailed discussion about the optimality of this result is in [6].
§6. Conclusion

In this paper, we presented a connected set of ideas originating in the ridgelet transform and culminating in the curvelet transform. We have shown how these systems efficiently represent objects that are singular along curves. These tools, however, may have several other potential applications.

Because of space limitations, we set aside questions related to the practicability of these new methods. We would like to point out that fast algorithms have been developed to implement the ridgelet, monoscale ridgelet, and curvelet transforms. We will report on the numerical aspects of these transforms in a separate paper.